Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundaries to implicitly improve feature encoding of the networks. These approaches often require additional modules which are difficult to integrate into the original architecture. To improve the segmentation near object boundaries, we propose a boundary-aware feature propagation mechanism. This mechanism is achieved by exploiting a multi-task learning framework that aims to explicitly guide the boundaries to their original locations. With one shared encoder, our network outputs (i) boundary localization, (ii) prediction of directions pointing to the object's interior, and (iii) semantic segmentation, in three parallel streams. The predicted boundaries and directions are fused to propagate the learned features to refine the segmentation. We conduct extensive experiments on the S3DIS and SensatUrban datasets against various baseline methods, demonstrating that our proposed approach yields consistent improvements by reducing boundary errors. Our code is available at https://github.com/shenglandu/PushBoundary.
translated by 谷歌翻译
在存在白噪声的情况下,在各个科学领域,在存在白噪声的情况下逃脱吸引盆地的平均退出时间至关重要。在这项工作中,我们提出了一种策略,以控制一般随机动力学系统的平均退出时间,以基于准潜电概念和机器学习实现所需的价值。具体而言,我们开发了一个神经网络体系结构来计算全局准次电位函数。然后,我们设计了一种系统的迭代数值算法来计算给定平均退出时间的控制器。此外,我们在有效的汉密尔顿 - 雅各比计划和受过训练的神经网络的帮助下确定了亚稳态吸引子之间的最可能路径。数值实验表明,我们的控制策略是有效且足够准确的。
translated by 谷歌翻译
对于基于骨架的动作识别中的当前方法通常是将长期时间依赖性作为骨骼序列捕获通常长的(> 128帧),这很常见,这对于先前的方法构成了一个具有挑战性的问题。在这种情况下,短期依赖性很少被正式考虑,这对于对类似动作进行分类至关重要。大多数当前的方法包括相互交织的仅空间模块和仅时间的模块,在这些模块中,在相邻框架中的关节之间的直接信息流受到阻碍,因此不如捕获短期运动并区分相似的动作对。为了应对这一限制,我们提出了一个作为stgat创造的一般框架,以建模跨天空信息流。它使仅空间模块与区域感知的时空建模相称。尽管STGAT在理论上对时空建模具有有效性,但我们提出了三个简单的模块,以减少局部时空特征冗余,并进一步释放STGAT的潜力,(1)(1)自我关注机制的范围,(2)动态重量的范围(2)沿时间尺寸的关节和(3)分别与静态特征分开的微妙运动。作为一个可靠的特征提取器,STGAT在对以前的方法进行分类时,在定性和定量结果中都证明了相似的动作。 STGAT在三个大规模数据集上实现了最先进的性能:NTU RGB+D 60,NTU RGB+D 120和动力学骨架400。释放了代码。
translated by 谷歌翻译
如何提取重要点云特征并估计它们之间的姿势仍然是一个具有挑战性的问题,因为点云的固有缺乏结构和暧昧的顺序排列。尽管对大多数3D计算机视觉任务的基于深度学习的方法进行了重大改进,例如对象分类,对象分割和点云注册,但功能之间的一致性在现有的基于学习的流水线上仍然没有吸引力。在本文中,我们提出了一种用于复杂对准场景的新型学习的对齐网络,标题为深度特征一致性,并由三个主模块组成:多尺度图形特征合并网络,用于将几何对应集转换为高维特征,对应加权用于构建多个候选内部子集的模块,以及命名为深度特征匹配的Procrustes方法,用于给出闭合方案来估计相对姿势。作为深度特征匹配模块的最重要步骤,构造每个Inlier子集的特征一致性矩阵以获得其主要向量作为相应子集的含义似然性。我们全面地验证了我们在3DMATCH数据集和基提ODOMOTRY数据集中的方法的鲁棒性和有效性。对于大型室内场景,3DMATCH数据集上的注册结果表明,我们的方法优于最先进的传统和基于学习的方法。对于Kitti户外场景,我们的方法仍然能够降低转换错误。我们还在交叉数据集中探讨其强大的泛化能力。
translated by 谷歌翻译
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
translated by 谷歌翻译
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, despite that a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node in the graph, and every node can have a different number of assigned persona embeddings. The framework is flexible enough and the general design helps in the wide applicability of the learned embeddings to suit the domain. We utilize publicly available benchmark datasets to evaluate our approach and against a variety of baselines. The experiments demonstrate the effectiveness of PersonaSAGE for a variety of important tasks including link prediction where we achieve an average gain of 15% while remaining competitive for node classification. Finally, we also demonstrate the utility of PersonaSAGE with a case study for personalized recommendation of different entity types in a data management platform.
translated by 谷歌翻译
With the development of natural language processing techniques(NLP), automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. It aims to evaluate the condition of both eyes of a patient respectively, and we formulate it as a particular multi-label classification task in this paper. Although there are a few related studies in other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of both eyes are mixed up in OEMR documents, with both free text and templated asymptomatic descriptions, resulting in sparsity and clutter of information. Second, OEMR documents contain multiple parts of descriptions and have long document lengths. Third, it is critical to provide explainability to the disease diagnosis model. To overcome those challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. In this framework, a preprocessing module is integrated to improve the density and quality of information. Then, we design a hierarchical transformer structure for learning the contextualized representations of each sentence in the OEMR document. For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information. Experiments on the real dataset and comparison with several baseline models show the advantage and explainability of our framework.
translated by 谷歌翻译
Because of the necessity to obtain high-quality images with minimal radiation doses, such as in low-field magnetic resonance imaging, super-resolution reconstruction in medical imaging has become more popular (MRI). However, due to the complexity and high aesthetic requirements of medical imaging, image super-resolution reconstruction remains a difficult challenge. In this paper, we offer a deep learning-based strategy for reconstructing medical images from low resolutions utilizing Transformer and Generative Adversarial Networks (T-GAN). The integrated system can extract more precise texture information and focus more on important locations through global image matching after successfully inserting Transformer into the generative adversarial network for picture reconstruction. Furthermore, we weighted the combination of content loss, adversarial loss, and adversarial feature loss as the final multi-task loss function during the training of our proposed model T-GAN. In comparison to established measures like PSNR and SSIM, our suggested T-GAN achieves optimal performance and recovers more texture features in super-resolution reconstruction of MRI scanned images of the knees and belly.
translated by 谷歌翻译
In this paper, we target at the problem of learning a generalizable dynamic radiance field from monocular videos. Different from most existing NeRF methods that are based on multiple views, monocular videos only contain one view at each timestamp, thereby suffering from ambiguity along the view direction in estimating point features and scene flows. Previous studies such as DynNeRF disambiguate point features by positional encoding, which is not transferable and severely limits the generalization ability. As a result, these methods have to train one independent model for each scene and suffer from heavy computational costs when applying to increasing monocular videos in real-world applications. To address this, We propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames. More specifically, we learn an implicit velocity field to estimate point trajectory from temporal features with Neural ODE, which is followed by a flow-based feature aggregation module to obtain spatial features along the point trajectory. We jointly optimize temporal and spatial features by training the network in an end-to-end manner. Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
translated by 谷歌翻译